Handling online survey infiltrators
Editor’s note: Tim McCarthy is the general manager at Imperium.
For market researchers everywhere, tackling fraud has become a high-stakes battle to defend data integrity. Yet, even as organizations scramble to thwart increasingly sophisticated attempts to infiltrate online surveys – including the wholesale deployment of bots and click farms – determined fraudsters constantly seek out fresh ways to subvert the system, ruthlessly exploiting every vulnerability.
Data integrity in the marketing research industry
The impact is significant; it’s estimated that between 15% and 25% of all market research survey respondents are fraudulent. Our own research shows that survey fraud escalated sharply during the pandemic, with fraudulent survey responses nearly doubling at times in 2020 – a worrying development when interference even at the lower end of that estimate can deliver dangerously skewed results.
However, with professional fraudsters not only ramping up attempts to evade detection but also sharing their tactics on YouTube, it’s a problem that’s set to persist.
Further, just as fraudsters are raising the bar, restrictions on third-party access to personal data that can help identify fakes and dupes are gaining ground. For example, Apple recently introduced its iCloud Private Relay service which can be used to send all browser info/traffic through Apple’s relays, returning it through a temporary IP, often shared by many others in the same region. Ironically, the very same processes that are designed to safeguard ordinary people from intrusive data harvesting are also serving to obscure the identities and intentions of bad actors.
Taking a measured approach to data integrity
At the same time, with quality sample increasingly scarce, it’s more important than ever that market researchers don’t lose sight of the need to provide genuine respondents with a rewarding survey experience. Even with the focus firmly on preserving data integrity, there’s a balance to be struck between taking essential precautions to eject cheaters and providing a frictionless journey for genuine panelists.
It requires a holistic approach. Researchers must be aware of dozens of individual survey elements that have the potential to affect data quality – as well as understanding how to manage and mitigate these threats while promoting process integrity.
1. Survey design.
Some quality assurance can be addressed through meticulous study design. Surveys should be engaging, relevant, clear and concise – those that can be completed in under 20 minutes are most likely to result in a completely usable data set. It’s best to incorporate a range of question types designed to identify poor respondents that are targeted and fit for purpose, without going overboard. Open-end, grid, low-incidence and differing-response questions will help weed out cheaters. Data reviews should be conducted consistently while fielding, ideally in real time through automation.
2. Checks and balances.
Pre-survey checks are necessary to filter out obvious frauds and dupes. But even when clearly fraudulent respondents are removed prior to, or upon, survey entry, a further ~10% still need to be removed manually post-completion due to inappropriate survey data or behavior. It’s relatively easy to filter out the fraudsters whose data is clearly and abundantly poor. But experience shows that flagging less-commonly monitored details – such as respondents re-entering the same phrase/response, copying and pasting OE responses or gaming LOI calculations by pausing for long periods of time before completing – can have a dramatic effect on the quality of the respondent pool.
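To make the less-commonly monitored details above concrete, here is a minimal sketch of what such post-completion checks might look like in code. The field names and thresholds are illustrative assumptions, not a published standard – a real quality pipeline would tune these against known-good data.

```python
# Hypothetical post-completion quality checks; all thresholds here
# are illustrative assumptions, not industry-standard values.

def flag_respondent(open_ends, page_times_sec, total_loi_sec):
    """Return a list of quality flags for one completed response."""
    flags = []

    # Re-entering the same phrase across open-ended (OE) questions.
    normalized = [oe.strip().lower() for oe in open_ends if oe.strip()]
    if len(normalized) > 1 and len(set(normalized)) == 1:
        flags.append("duplicate_open_ends")

    # A long answer submitted in a few seconds suggests a pasted
    # (rather than typed) open-end response.
    for oe, secs in zip(open_ends, page_times_sec):
        if len(oe) > 80 and secs < 5:
            flags.append("possible_pasted_open_end")
            break

    # Gaming length-of-interview (LOI): one long pause dominates the
    # total time while other pages are answered almost instantly.
    if page_times_sec:
        longest = max(page_times_sec)
        if longest > 0.8 * total_loi_sec and min(page_times_sec) < 2:
            flags.append("loi_gaming_pause")

    return flags
```

For example, a respondent who types the same phrase into two open-ends would come back flagged as `duplicate_open_ends`, while one whose total LOI is almost entirely a single pause would be flagged as `loi_gaming_pause`.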
3. Keep it real.
Build your quality checks into the survey itself and ensure all or most are related to the survey content. Creating unrelated quality-check pre-screeners and/or setting multiple off-topic trap questions can backfire and unnecessarily extend a survey’s length. Instead, deploy actual survey questions with anomalous or inappropriate behaviors flagged. Really obvious red-herring questions can often be identified by bogus respondents and will sometimes confuse or frustrate real respondents – especially if they’re inserted toward the end of a long survey, when attentiveness isn’t at its peak for any respondent.
4. Recalibrate expectations.
Real people aren’t perfect – they can be distracted, inattentive and contrary at times, while still being valid (and valuable) respondents. Rather than ejecting respondents at the first incidence of concern, it’s important to find secondary data points – extracted from other questions or passive checks – to confirm suspicions of poor quality. Identifying a fraudster or malicious respondent is not about catching a single poor response, but rather about recognizing multiple flags consistent with cheating, throughout the course of a survey. Sophisticated cheaters can appear to be the perfect respondents at first glance – completing in precisely the allocated time, for example. You’ll need to dig deeper into the data and use passive data points.
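The "multiple corroborating flags" principle above can be sketched as a simple weighted score. The flag names, weights and threshold below are assumptions chosen for illustration; the point is only that no single anomaly ejects a respondent on its own.

```python
# Illustrative multi-flag review logic: a single anomaly routes a
# respondent to manual review, while corroborating signals are
# required before ejection. Weights are hypothetical.

FLAG_WEIGHTS = {
    "speeding": 1,           # finished far below median LOI
    "straightlining": 1,     # identical answers down a grid
    "duplicate_open_ends": 2,
    "failed_trap": 2,
}

def review_decision(flags, eject_threshold=3):
    """Score accumulated flags; eject only when evidence corroborates."""
    score = sum(FLAG_WEIGHTS.get(f, 1) for f in flags)
    if score >= eject_threshold:
        return "eject"
    if score > 0:
        return "manual_review"  # possibly a real person having a bad moment
    return "keep"
```

A distracted but genuine respondent who speeds through one section lands in manual review rather than being ejected outright, which matches the recalibrated expectations described above.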
5. Technology can help.
In-survey automated checks that utilize machine learning (ML) models can help quickly spot anomalous behaviors, without impacting the panelist’s survey experience. Intelligent ML models will process inputs to create an automated feedback loop that makes it easier not only to improve extant survey data but to predict fraudulent or unusual behavior in the future. It’s unlikely that ML will ever completely replace human intuition but implementing effective automation and ML frees up time for skilled analysts to focus on more productive tasks.
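As a stand-in for the automated checks described above, here is a minimal, stdlib-only sketch that flags respondents whose completion time deviates sharply from the pool. A production ML model would score many behavioral features at once; the single z-score feature and the 2.0 cutoff here are simplifying assumptions for illustration.

```python
import statistics

def anomalous_completion_times(loi_seconds, z_cutoff=2.0):
    """Return indices of respondents whose length-of-interview (LOI)
    deviates sharply from the pool - a one-feature stand-in for the
    richer behavioral scoring an ML model would perform."""
    mean = statistics.mean(loi_seconds)
    stdev = statistics.pstdev(loi_seconds)  # population std. deviation
    if stdev == 0:
        return []  # no variation, nothing to flag
    return [i for i, t in enumerate(loi_seconds)
            if abs(t - mean) / stdev > z_cutoff]
```

Run over each day's completes, a check like this can feed the automated feedback loop the section describes: flagged indices go to analysts, whose verdicts become labeled training data for the next model iteration.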
The AI paradox
Just as artificial intelligence has offered fresh tools for tackling fraud, it’s also provided fraudsters with opportunities to execute increasingly creative encroachments.
Highly sophisticated automated scripts can slice through surveys, flooding them with bad respondents, for example, while advanced AI can analyze OEs to synthesize a closely related answer that may pass superficial inspection. This is why gathering supporting data points to work out what respondents are doing on a page-by-page basis is crucial. As fraud becomes more automated, our response levels must match it in both speed and accuracy.
Mitigating fraud is a long-term commitment. Cheaters will always try to find new ways to circumvent tricks and traps but if the cost-benefit doesn’t add up, the rewards simply won’t justify the effort. By forcing fraudsters to spend more time and energy gaming a survey than its monetary worth – while also providing genuine respondents with a satisfying experience that elicits generous engagement – market researchers could well tip the quality balance in their favor for good.